Smaller stack usage for SHA-1, SHA-256 and SHA-512. #709
MarekKnapek wants to merge 2 commits into libtom:develop
sjaeckel left a comment
That looks interesting. Thanks for the next PR :)
At first glance it seems like we'd be trading space for time, meaning that execution should be slower after the patch is applied.
So I modified the timing demo a bit to show something relevant, and the results before the patch look as follows:
sha512 : Process at 39
sha512-256 : Process at 39
sha384 : Process at 39
sha512-224 : Process at 39
sha1 : Process at 61
sha256 : Process at 122
sha224 : Process at 122
vs. after this patch is applied:
sha512 : Process at 39
sha384 : Process at 40
sha512-256 : Process at 40
sha512-224 : Process at 40
sha1 : Process at 68
sha224 : Process at 106
sha256 : Process at 106
sha1 really got worse (61 -> 68), the sha512-based hashes stayed more or less the same (39 -> 39-40, maybe a little bit slower), but the sha256-based ones got significantly better performance (122 -> 106)!?
Not sure what to do with sha1, maybe enable this patch via a new LTC_SMALL_STACK option?
The other two I'd simply take unconditionally.
What do you think?
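To make the space-versus-time question concrete, here is a minimal sketch of one common way to shrink SHA-256 stack usage: replacing the full 64-word message schedule with a 16-word rolling window. This is an assumption about the general technique, not a reproduction of the PR's diff, and the names `sched_full` and `sched_rolling` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SHA-256 small sigma functions as defined in FIPS 180-4. */
#define ROR32(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define SSIG0(x) (ROR32(x, 7) ^ ROR32(x, 18) ^ ((x) >> 3))
#define SSIG1(x) (ROR32(x, 17) ^ ROR32(x, 19) ^ ((x) >> 10))

/* Full schedule: expands the block into 64 words (256 bytes of stack). */
static void sched_full(const uint32_t m[16], uint32_t W[64])
{
    int t;
    for (t = 0; t < 16; t++) {
        W[t] = m[t];
    }
    for (t = 16; t < 64; t++) {
        W[t] = SSIG1(W[t - 2]) + W[t - 7] + SSIG0(W[t - 15]) + W[t - 16];
    }
}

/* Rolling schedule: keeps only 16 words (64 bytes of stack).
 * Returns W[t] and updates w in place; it must be called with
 * t = 0, 1, ..., 63 in order, because each slot is overwritten
 * once the word 16 rounds back is no longer needed. */
static uint32_t sched_rolling(uint32_t w[16], int t)
{
    assert(t >= 0 && t < 64);
    if (t >= 16) {
        /* w[t & 15] still holds W[t - 16] at this point. */
        w[t & 15] += SSIG1(w[(t - 2) & 15]) + w[(t - 7) & 15]
                   + SSIG0(w[(t - 15) & 15]);
    }
    return w[t & 15];
}
```

With a build switch such as the `LTC_SMALL_STACK` idea above, the round loop would call `sched_rolling(w, t)` once per round instead of filling `W[64]` up front. Both forms do the same additions; the rolling one adds index masking but touches less memory, which may be one reason the timings can move in either direction.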
I think my next PR will be about x86 (and amd64) specific intrinsics, making SHA-1, SHA-256 and SHA-512 much, much faster.
* Add the option to only run for a subset of algos.
* Improve `hash` to show something meaningful.

Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
That's the timing demo in
That depends on how you build the library. I usually simply run If you use CMake (and build in a folder inside the ltc folder) it'd be
Those previous tests were done with the standard config. With Before the patch: After the patch: So it seems like your patch improves the performance in the default case ( In the case FYI:
OK, that sounds nice. You're also thinking about adding
My performance measurements are different. Maybe it depends on processor cache size, branch prediction buffer size and many other things. Before: After: My command line was: Another measurement, this time with Before: After: Here I was able to improve SHA-1 But it is still slower than
For sure, since you most likely have a different CPU. But the differences of the algorithm classes themselves are comparable and my statement from above:
is thereby validated.
Absolutely.
FYI: lower value = faster, the number shown is "the number of CPU cycles per iteration" -> i.e. by having it changed from 110 to 121 you made it 10% slower :-D
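As a quick check of that arithmetic, here is a small sketch that turns two cycle counts into a relative change. The helper name `percent_change` is hypothetical and only illustrates the calculation; it is not part of the timing demo.

```c
#include <assert.h>

/* Relative change in percent between two cycle counts.
 * A positive result means more cycles per iteration, i.e. slower. */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For 110 -> 121 this gives (121 - 110) / 110 = 10% more cycles. Applied to the first set of numbers in this thread, sha1 going from 61 to 68 is about 11.5% slower, and sha256 going from 122 to 106 is about 13.1% faster.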
No need to run all these, especially not To speed up your local development cycle I'd suggest you run And I run